Cost Sensitive Discretization of

نویسندگان

  • Tom Brijs
  • Koen Vanhoof
چکیده

Many algorithms in decision tree learning are not designed to handle numeric valued attributes very well. Therefore, discretization of the continuous feature space has to be carried out. In this article we introduce the concept of cost sensitive discretization as a preprocessing step to induction of a classifier and as an elaboration of the error-based discretization method to obtain an optimal multi-interval splitting for each numeric attribute. A transparant description of the method and steps involved in cost sensitive discretization is given. We also evaluate its performance against two other well known methods, i.e. entropy-based discretization and pure error-based discretization on a real life financial dataset. From the algoritmic point of view, we show that an important deficiency from error-based discretization methods can be solved by introducing costs. From the application point of view, we discovered that using a discretization method is recommended. To conclude, we use ROC-curves to illustrate that under particular conditions cost-based discretization may be optimal.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cost Sensitive Discretization of Numeric Attributes

Many algorithms in decision tree learning have not been designed to handle numerically-valued attributes very well. Therefore, discretization of the continuous feature space has to be carried out. In this article we introduce the concept of cost-sensitive discretization as a preprocessing step to induction of a classifier and as an elaboration of the error-based discretization method to obtain ...

متن کامل

To Express Required CT-Scan Resolution for Porosity and Saturation Calculations in Terms of Average Grain Sizes

Despite advancements in specifying 3D internal microstructure of reservoir rocks, identifying some sensitive phenomenons are still problematic particularly due to image resolution limitation. Discretization study on such CT-scan data always has encountered with such conflicts that the original data do not fully describe the real porous media. As an alternative attractive approach, one can recon...

متن کامل

A New Formulation for Cost-Sensitive Two Group Support Vector Machine with Multiple Error Rate

Support vector machine (SVM) is a popular classification technique which classifies data using a max-margin separator hyperplane. The normal vector and bias of the mentioned hyperplane is determined by solving a quadratic model implies that SVM training confronts by an optimization problem. Among of the extensions of SVM, cost-sensitive scheme refers to a model with multiple costs which conside...

متن کامل

Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms

In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...

متن کامل

A particle swarm optimization algorithm for minimization analysis of cost-sensitive attack graphs

To prevent an exploit, the security analyst must implement a suitable countermeasure. In this paper, we consider cost-sensitive attack graphs (CAGs) for network vulnerability analysis. In these attack graphs, a weight is assigned to each countermeasure to represent the cost of its implementation. There may be multiple countermeasures with different weights for preventing a single exploit. Also,...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001